Markovian language model of the DNA and its information content

Authors

  • S. Srivastava
  • M. S. Baptista
Abstract

This work proposes a memoryless Markovian model of DNA that greatly reduces its complexity. We encode nucleotide sequences into symbolic sequences, called words, from which we establish the meaningful length of words and the groups of words that share symbolic similarities. Interpreting a node as a group of similar words and an edge as the functional connectivity between groups allows us to construct a network of the grammatical rules governing the appearance of groups of words in DNA. Our model allows us to predict the transition between groups of words in DNA with unprecedented accuracy, and to easily calculate many informational quantities that better characterize DNA. In addition, we reduce the DNA of known bacteria to a network of only tens of nodes, show how our model can be used to detect similar (or dissimilar) genes in different organisms, and identify which sequences of symbols are responsible for most of the information content of DNA. Therefore, DNA can indeed be treated as a language, a Markovian language, where a 'word' is an element of a group, and its grammar represents the rules behind the probability of transitions between any two groups.
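The pipeline described in the abstract (words, groups of words, transition probabilities between groups, informational quantities) can be illustrated with a minimal Python sketch. It is not the authors' method: the word length `k=3`, the composition-based grouping rule in `group_of`, the toy sequence, and all function names are assumptions chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict
from math import log2


def to_words(seq, k=3):
    """Split a nucleotide string into non-overlapping words of length k.

    k=3 is an assumed value, not the word length established in the paper.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def group_of(word):
    """Hypothetical grouping rule: words with the same nucleotide
    composition (regardless of order) belong to the same group."""
    return "".join(sorted(word))


def transition_matrix(groups):
    """Estimate P(next group | current group) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(groups, groups[1:]):
        counts[current][following] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }


def entropy_rate(groups, P):
    """Entropy rate in bits per word: each row's entropy is weighted by
    the empirical frequency of its current group."""
    current = Counter(groups[:-1])
    total = sum(current.values())
    return -sum(
        (current[a] / total) * p * log2(p)
        for a, row in P.items()
        for p in row.values()
    )


# Toy sequence, made up for illustration only.
seq = "ATGGCGTATGCAATGGCGTTTGCA"
groups = [group_of(w) for w in to_words(seq)]
P = transition_matrix(groups)
print(P)
print(entropy_rate(groups, P))
```

Each key of `P` plays the role of a node and each nonzero entry the role of a weighted edge, so the nested dictionary is a small stand-in for the network of grammatical rules mentioned above.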

Similar resources

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks

The rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the rumor language can be helpful in identifying it. The previous research has focused more on the contextual information to reply tweets and less on the content features of the original rumor to address the rumor detection problem. Most of the studies have been in...

Randomized Algorithm For 3-Set Splitting Problem and it's Markovian Model

In this paper we restrict every set splitting problem to the special case in which every set has just three elements. This restricted version is also NP-complete. We then introduce a general conversion from any set splitting problem to 3-set splitting. Next we introduce a randomized algorithm and use a Markov chain model for the run-time complexity analysis of this algorithm. In the last section ...

Monte Carlo Simulation to Compare Markovian and Neural Network Models for Reliability Assessment in Multiple AGV Manufacturing System

We compare two approaches for a Markovian model in flexible manufacturing systems (FMSs) using Monte Carlo simulation. The model, which is a development of Fazlollahtabar and Saidi-Mehrabad (2013), considers two features of automated flexible manufacturing systems equipped with automated guided vehicles (AGVs), namely the reliability of machines and the reliability of AGVs in a multiple AGV jobsho...

Noospheric Psychological-Educational Paradigm as a Methodological Basis for Teaching Russian-Language Business Communication to Foreign Students

In the context of the polyparadigmatic system of higher education, the noospheric psychological-pedagogical paradigm is considered, and on its basis a linguodidactic model is developed for the formation of professional-communicative competence (PCC) in Russian-language business communication among foreign students. The research focuses on the basic principles of the noospheric paradigm, which procl...

Attitudes towards English as an International Language (EIL) in Iran: Development and Validation of a New Model and Questionnaire

This study aimed at developing and validating a new model and instrument to explore attitudes of Iranian EFL learners towards English as an International Language (EIL). In so doing, the researchers followed several rigorous steps including extensive literature review, content selection, item generation, designing the rating scales and personal information part, Delphi technique, item revision,...

Volume: 3

Publication date: 2016